The Best 610 Speech Synthesis Tools in 2025

Kokoro 82M
Apache-2.0
Kokoro is an open-source text-to-speech (TTS) model with 82 million parameters, renowned for its lightweight architecture and high audio quality, while also being fast and cost-effective.
Speech Synthesis English
hexgrad
2.0M
4,155
XTTS V2
Other
ⓍTTS is a voice generation model that performs cross-lingual voice cloning from just a 6-second audio clip and supports 17 languages.
Speech Synthesis
coqui
1.7M
2,630
F5 TTS
F5-TTS is a flow matching-based speech synthesis model focused on fluent and faithful speech generation.
Speech Synthesis
SWivid
851.49k
1,000
Bigvgan V2 22khz 80band 256x
MIT
BigVGAN is a general-purpose neural vocoder trained at scale, capable of generating high-quality audio waveforms from mel spectrograms.
Speech Synthesis
nvidia
503.23k
16
Speecht5 Tts
MIT
A SpeechT5 speech synthesis (text-to-speech) model fine-tuned on the LibriTTS dataset, supporting high-quality text-to-speech conversion.
Speech Synthesis Transformers
microsoft
113.83k
760
Dia 1.6B
Apache-2.0
Dia is a 1.6-billion-parameter text-to-speech model developed by Nari Labs. It generates highly realistic dialogue directly from text, supports control over emotion and tone, and can produce non-verbal sounds.
Speech Synthesis Safetensors English
nari-labs
80.28k
1,380
Csm 1b
Apache-2.0
CSM is a 1-billion-parameter speech generation model developed by Sesame, capable of generating RVQ audio codes from text and audio inputs.
Speech Synthesis English
sesame
65.03k
1,950
Kokoro 82M V1.1 Zh
Apache-2.0
Kokoro is an open-weight series of small yet powerful text-to-speech (TTS) models, now featuring data from 100 Chinese speakers sourced from professional datasets.
Speech Synthesis
hexgrad
51.56k
112
Indic Parler Tts
Apache-2.0
Indic Parler-TTS is a multilingual extension of Parler-TTS Mini, supporting 21 languages including various Indian languages and English.
Speech Synthesis Transformers Supports Multiple Languages
ai4bharat
43.59k
124
Bark
MIT
Bark is a Transformer-based text-to-audio model created by Suno, capable of generating highly realistic multilingual speech, music, background noise, and simple sound effects.
Speech Synthesis Transformers Supports Multiple Languages
suno
35.72k
1,326
E2 TTS
E2 TTS is a fully non-autoregressive zero-shot text-to-speech model that supports high-quality speech synthesis.
Speech Synthesis
SWivid
32.58k
48
Xcodec2
XCodec2 is a speech tokenizer that supports multilingual semantic modeling of speech and high-quality speech reconstruction.
Speech Synthesis
HKUSTAudio
32.36k
67
Parler Tts Large V1
Apache-2.0
A 2.2-billion-parameter text-to-speech model trained on 45,000 hours of audio data, supporting control of voice characteristics via text prompts.
Speech Synthesis Transformers English
parler-tts
28.69k
252
Mms Tts Eng
An English text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis.
Speech Synthesis Transformers
facebook
28.60k
146
Bark Small
MIT
Bark is a Transformer-based multilingual text-to-audio model developed by Suno, capable of generating realistic speech, music, and non-verbal sounds.
Speech Synthesis Transformers Supports Multiple Languages
suno
22.74k
201
Mms Tts Yor
A Yoruba text-to-speech model developed by Meta, based on the VITS architecture for high-quality speech synthesis.
Speech Synthesis Transformers
facebook
17.88k
19
Parler Tts Mini V1
Apache-2.0
A lightweight text-to-speech model trained on 45,000 hours of audio, supporting control of voice characteristics via text prompts.
Speech Synthesis Transformers English
parler-tts
14.16k
143
Orpheus 3b 0.1 Ft Q4 K M GGUF
Apache-2.0
Orpheus-TTS is a lightweight text-to-speech model that supports local operation, providing high-quality speech synthesis capabilities.
Speech Synthesis English
isaiahbjork
13.43k
48
Bruce
This is an RVC (Retrieval-based Voice Conversion) model designed for audio-to-audio tasks, capable of converting input audio into output audio with a specific style.
Speech Synthesis Transformers
sail-rvc
11.79k
0
Homersimpson2333333
This is a voice conversion model based on RVC (Retrieval-Based Voice Conversion) technology, capable of transforming input audio into the voice style of Homer Simpson.
Speech Synthesis Transformers
sail-rvc
11.36k
1
Freddie Mercury RVC 700 Epochs
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, trained for 700 epochs, capable of converting input audio into Freddie Mercury-style speech.
Speech Synthesis Transformers
sail-rvc
8,750
1
Lana Del Rey E1000 S13000
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of converting input audio into speech with a specific style.
Speech Synthesis Transformers
sail-rvc
8,707
1
Adele RVC 400 Epochs
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, trained for 400 epochs, capable of converting input audio into output audio that mimics Adele's vocal timbre.
Speech Synthesis Transformers
sail-rvc
8,267
0
Xxxtentacion
This is an audio-to-audio conversion model based on the RVC architecture, specifically designed for processing XXXTentacion-style voice conversion.
Speech Synthesis Transformers
sail-rvc
7,984
0
Xphonebert Base
MIT
XPhoneBERT is the first multilingual phoneme representation pretraining model for text-to-speech (TTS), based on the BERT-base architecture and trained with 330 million phoneme-level sentences across nearly 100 languages.
Speech Synthesis Transformers
vinai
7,561
15
Indicf5
IndicF5 is a near-human multilingual text-to-speech (TTS) model trained on 1,417 hours of high-quality speech data, supporting 11 Indian languages.
Speech Synthesis Other
ai4bharat
6,595
37
Michaeljackson
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into Michael Jackson-style speech.
Speech Synthesis Transformers
sail-rvc
6,250
0
Shrek
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of converting source speech into a target voice style.
Speech Synthesis Transformers
sail-rvc
5,919
2
Eminem E600 S5400
This is a voice conversion model based on RVC (Retrieval-Based Voice Conversion) technology, capable of transforming input audio into speech output with a specific style.
Speech Synthesis Transformers
sail-rvc
5,639
1
XTTS V1
Other
ⓍTTS is a voice generation model that can clone voices and apply them to different languages with just a 6-second audio clip.
Speech Synthesis
coqui
5,449
369
Parler Tts Mini V0.1
Apache-2.0
Parler-TTS Mini is a lightweight text-to-speech model trained on 10.5K hours of audio data, supporting voice feature control through text prompts.
Speech Synthesis Transformers English
parler-tts
5,430
352
Ariana Grande RVC V1
This is a voice conversion model based on RVC (Retrieval-Based Voice Conversion) technology, capable of transforming input audio into Ariana Grande-style speech.
Speech Synthesis Transformers
sail-rvc
5,404
2
F15
Fish Speech V1.5 is a leading text-to-speech (TTS) model trained on over 1 million hours of multilingual audio data.
Speech Synthesis Supports Multiple Languages
cocktailpeanut
5,162
0
Csm 1b
Apache-2.0
CSM is a 1B-parameter speech generation model developed by Sesame, capable of generating RVQ audio codes from text and audio inputs, supporting context-aware speech generation.
Speech Synthesis English
eustlb
5,144
3
Drake RVC
Drake_RVC is an audio-to-audio model based on RVC (Retrieval-based Voice Conversion) technology, specifically designed for voice conversion tasks.
Speech Synthesis Transformers
sail-rvc
5,043
1
Tts Hifigan
HiFiGAN is a Generative Adversarial Network (GAN) model capable of generating high-quality audio from mel-spectrograms, suitable for text-to-speech systems.
Speech Synthesis English
nvidia
5,022
36
Alvin
This is an RVC (Retrieval-based Voice Conversion) model designed for audio-to-audio conversion tasks.
Speech Synthesis Transformers
sail-rvc
4,909
0
Billie Eilish
This is a voice conversion model based on RVC (Retrieval-based Voice Conversion) technology, capable of transforming input audio into output audio that mimics Billie Eilish's voice.
Speech Synthesis Transformers
sail-rvc
4,899
2
Tts En Fastpitch
FastPitch is a fully parallel Transformer-based text-to-speech model capable of controlling pitch and phoneme duration, generating high-quality American English speech.
Speech Synthesis English
nvidia
4,701
38
Mms Tts Fra
A French text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis.
Speech Synthesis Transformers
facebook
4,667
8
Justinbiebermw
This is an audio conversion model based on RVC (Retrieval-Based Voice Conversion) technology, specifically designed to transform input audio into Justin Bieber's vocal style.
Speech Synthesis Transformers
sail-rvc
4,656
0
Frank Sinatra 51600 Steps 250 Epochs RVC
This is an audio-to-audio conversion model based on the RVC framework, specifically designed for voice conversion tasks.
Speech Synthesis Transformers
sail-rvc
4,590
0
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase